
Implement pure CLI AutoR workflow and publication packages#1

Closed
Zefan-Cai wants to merge 3 commits into main from zefan-dev


@Zefan-Cai
Collaborator

Summary

This PR turns the branch into a pure CLI-first AutoR workflow runner with stronger workflow state management, richer platform-alignment modules, and production-oriented stage 07/08 packaging.

The branch keeps main.py as the run entrypoint, src/manager.py as the 8-stage orchestrator with human approval gates, src/operator.py as the Claude Code executor, and src/utils.py as the run-layout/prompt/validation layer. Runs still live under runs/<run_id>/, with stage drafts written to stages/*.tmp.md before validation and promotion.
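A minimal sketch of the draft-then-promote step, for readers unfamiliar with the layout. The function name `promote_draft`, the `required_headings` check, and the example stage name are hypothetical; the actual validation in src/utils.py may differ.

```python
from pathlib import Path


def promote_draft(stage_dir: Path, stage_name: str, required_headings: list) -> Path:
    """Validate <stage>.tmp.md and, if it passes, promote it to <stage>.md.

    The heading check is an illustrative stand-in for real validation.
    """
    draft = stage_dir / f"{stage_name}.tmp.md"
    final = stage_dir / f"{stage_name}.md"
    text = draft.read_text(encoding="utf-8")

    # Reject the draft if any required heading is absent; the draft stays on disk.
    missing = [h for h in required_headings if h not in text]
    if missing:
        raise ValueError(f"draft {draft.name} is missing sections: {missing}")

    # Path.replace renames the file, overwriting any previously promoted version.
    draft.replace(final)
    return final
```

A failed validation leaves the `.tmp.md` file in place, so a later attempt can inspect or overwrite it.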

TODO Status

  1. Cross-stage rollback and downstream invalidation
    Status: Done
    What landed:
  • --rollback-stage CLI support
  • downstream stages marked stale
  • rollback target marked pending/dirty
  • approved memory rebuilt from manifest after rollback
  2. Run manifest and stage status file
    Status: Done
    What landed:
  • run_manifest.json as the primary machine-readable state source
  • per-stage status, approval, stale/dirty flags, session id, attempt count, artifact pointers, handoff pointer, compressed summary
  3. Operator session recovery and failure hardening
    Status: Done
    What landed:
  • per-session state files under operator_state/
  • per-attempt state files under operator_state/
  • broken sessions are no longer reused
  • resume failure falls back to a fresh session and records attempt metadata
  • missing stage draft fallback materialization retained and integrated
  4. Stage context compression and handoff
    Status: Done
    What landed:
  • handoff/<stage>.md summaries for approved stages
  • routed orchestration context, manifest context, and handoff context injected into prompts
  5. TODO item 5
    Status: Not done
    Note:
  • The original task text provided in the thread was truncated after item 4 and before item 7, so this item was not fully visible. I did not guess and implement a partially specified requirement.
  6. TODO item 6
    Status: Not done
    Note:
  • Same reason as item 5: the original task text was truncated and the item was not fully visible.
  7. Submission-grade paper package
    Status: Done
    What landed:
  • stronger stage 07 paper package generation
  • manuscript .tex, bibliography, tables, figure manifest, build script, submission checklist, and compiled PDF placeholder artifacts
  8. Review / dissemination package
    Status: Done
    What landed:
  • stronger stage 08 release/review package generation
  • readiness checklist, threats-to-validity notes, artifact bundle manifest, release notes, external summary, and dissemination collateral generation hooks
  9. Frontend run dashboard
    Status: Not done by design
    Note:
  • A frontend/dashboard iteration was explored earlier, but the final direction for this branch was explicitly changed to pure CLI. The web stack was removed accordingly.
  10. Tests and CI
    Status: Partially done
    What landed:
  • expanded regression coverage around prompt context, KB search, rollback/stale handling, operator recovery, literature workflow, debate workflow, playbook workflow, router execution, foundry generation, and manifest consistency
    What is still missing:
  • CI wiring in GitHub Actions or equivalent
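
The rollback behavior described in the first TODO item can be sketched roughly as below. This is an illustration, not the code in src/manager.py: the stage ids, the manifest field names (`status`, `approved`, `stale`, `dirty`), and both function names are assumptions.

```python
# Hypothetical ids for the 8 stages; the real manifest may key stages differently.
STAGE_IDS = [f"{i:02d}" for i in range(1, 9)]


def rollback_to(manifest: dict, target: str) -> dict:
    """Mark the rollback target pending/dirty and every later stage stale."""
    idx = STAGE_IDS.index(target)
    target_entry = manifest["stages"][target]
    target_entry.update(status="pending", dirty=True, approved=False)

    # Downstream stages keep their artifacts but can no longer be trusted.
    for stage_id in STAGE_IDS[idx + 1:]:
        manifest["stages"][stage_id]["stale"] = True
    return manifest


def rebuild_approved_memory(manifest: dict) -> list:
    """Recompute which stages may feed later prompts, using only the manifest."""
    return [
        stage_id
        for stage_id in STAGE_IDS
        if manifest["stages"][stage_id]["approved"]
        and not manifest["stages"][stage_id]["stale"]
        and not manifest["stages"][stage_id]["dirty"]
    ]
```

With all eight stages approved, rolling back to stage 03 in this sketch leaves only stages 01 and 02 in the rebuilt approved memory.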

Additional Notes

  • run_state.json file dependency has been removed; run_manifest.json is now the sole persisted workflow state source.
  • src/run_state.py remains only as an in-memory compatibility formatter/adapter derived from the manifest.
  • The branch is intentionally scoped to a pure CLI main workflow rather than a web control plane.
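
For illustration, a manifest-derived in-memory view could look like the sketch below. The `StageEntry` fields mirror the per-stage data listed under TODO Status, but the exact field names, the `run_id` key, and `format_run_state` are assumptions rather than the contents of src/manifest.py or src/run_state.py.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StageEntry:
    """Hypothetical per-stage record as it might appear in run_manifest.json."""
    status: str = "pending"
    approved: bool = False
    stale: bool = False
    dirty: bool = False
    session_id: Optional[str] = None
    attempt_count: int = 0
    artifacts: List[str] = field(default_factory=list)
    handoff_path: Optional[str] = None
    summary: str = ""


def format_run_state(manifest: dict) -> str:
    """Build a read-only text view from the manifest; nothing is written to disk."""
    lines = [f"run {manifest['run_id']}"]
    for stage_id in sorted(manifest["stages"]):
        entry = StageEntry(**manifest["stages"][stage_id])
        flags = [name for name in ("stale", "dirty") if getattr(entry, name)]
        suffix = f" [{', '.join(flags)}]" if flags else ""
        lines.append(
            f"{stage_id}: {entry.status}{suffix} (attempts={entry.attempt_count})"
        )
    return "\n".join(lines)
```

Because the view is recomputed from the manifest on every call, there is no second persisted file that can drift out of sync with it.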

Validation

  • python -m py_compile main.py src/*.py src/platform/*.py tests/*.py
  • python -m unittest discover -s tests -v

@black-yt
Collaborator

Thanks for the work here. There is useful progress in this branch, but this PR should be split before merge.

Right now it is too broad to review safely: 28 changed files, ~4k additions, and several distinct concerns bundled together. In particular, it mixes:

  1. Core workflow state changes (run_manifest, rollback, stale/dirty stage tracking, CLI flags)
  2. Operator/session recovery and stage handoff/compression logic
  3. Stage 07/08 publication-package changes
  4. A large new src/platform/* stack plus knowledge_base.py / inspection.py
  5. README + test expansion

These are not one review unit. Some are core workflow changes, some are reliability improvements, some are writing-package features, and some are a substantial architectural expansion. Reviewing them together makes it hard to reason about regressions, approve only the good parts, or maintain a clear project direction.

Suggested split:

  • PR A: workflow-state layer only

    • main.py, src/manager.py, src/manifest.py, src/run_state.py, src/utils.py
    • focus on run_manifest, rollback, stale downstream invalidation, and state transitions
  • PR B: operator recovery / continuation only

    • src/operator.py + the minimal related manager changes + targeted tests
    • focus on session recovery, failed resume fallback, attempt metadata, handoff/compression if still needed
  • PR C: Stage 07/08 packaging only

    • publication package, review/dissemination artifacts, README updates relevant to that scope, and tests for that slice
  • PR D: platform modules only, if they are still desired

    • src/platform/*, src/knowledge_base.py, src/inspection.py
    • this is a major architectural addition by itself and should be reviewed independently from the CLI workflow changes

Please keep each split PR narrowly scoped, with its own motivation, tests, and validation. In the current form, this is too much surface area for one merge.

@tangxiangru
Collaborator

@yyifan-onyan

@yyifan-Onyen
Collaborator

Code Review

I agree with @black-yt's suggestion to split this PR — 28 files and ~4k additions across multiple unrelated concerns is too much surface area for a single review.

Beyond the split, there's a larger issue: most of the core functionality in this PR has already landed on main through other PRs.

Already on main

| Feature | In this PR | On main now |
| --- | --- | --- |
| run_manifest.json state management | Yes | Yes (src/manifest.py, 410 lines) |
| --rollback-stage + downstream invalidation | Yes | Yes |
| Operator session recovery / attempt state | Yes | Yes (merged via PR #12) |
| Stage handoff / compression | Yes | Yes |

These portions would conflict heavily on rebase and would essentially be duplicated work.

What's actually new

  • --show-status and --kb-search CLI commands — useful, worth a focused PR
  • src/knowledge_base.py and src/inspection.py — potentially useful but need their own review
  • src/platform/* (14 files) — see below

Concerns about src/platform/*

Many of the 14 platform modules are architectural stubs rather than functional code:

  • sandbox.py (39 lines): SandboxRunner just calls subprocess.run directly — no actual sandboxing
  • security.py (55 lines): RBAC role definitions, but AutoR is a single-user CLI tool
  • messaging.py (31 lines), protocols.py (53 lines): interface definitions with no implementation
  • semantic.py (52 lines): token-overlap ranking presented as "semantic search" — not embedding-based
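
To make the semantic.py point concrete, here is a minimal sketch of what token-overlap ranking generally looks like. It is not the code in semantic.py: the function names and the choice of Jaccard similarity are assumptions made for illustration.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_by_token_overlap(query: str, docs: list) -> list:
    """Rank docs by Jaccard overlap between query tokens and document tokens."""
    q = tokenize(query)
    scored = []
    for position, doc in enumerate(docs):
        d = tokenize(doc)
        union = q | d
        score = len(q & d) / len(union) if union else 0.0
        scored.append((score, position))

    # Highest score first; ties keep the original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(docs[position], score) for score, position in scored]
```

A query such as "car" scores 0.0 against a document about "automobile repair" in this scheme, because no literal token is shared. An embedding-based search would be expected to relate the two, which is why the "semantic" label is misleading here.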

The modules with real substance (router.py 269 lines, literature.py 271 lines, debate.py 157 lines) are imported by manager.py but not actually used in the core _run_stage loop — the router is instantiated but the stage execution still goes through the existing ClaudeOperator path.

Merge conflicts

8 files currently conflict with main: README.md, main.py, src/manager.py, src/manifest.py, src/operator.py, src/platform/foundry.py, src/utils.py, tests/test_operator_recovery.py.

Suggested path forward

  1. Drop the already-merged portions (manifest, rollback, operator recovery, handoff) — they're on main already
  2. PR A: --show-status + --kb-search CLI commands with knowledge_base.py — small, reviewable, useful
  3. PR B: Platform modules that have real functionality (router, literature, debate) — but only if they're wired into the actual stage execution, not just imported
  4. Hold off on stub modules (sandbox, security, messaging, protocols, semantic) until there's a concrete use case driving them

@black-yt
Collaborator

black-yt commented Apr 8, 2026

Thank you for your contribution. This PR has been superseded by your newer PR, so we are closing it.

@black-yt black-yt closed this Apr 8, 2026